SUMMARY

We now have the query facility for our universal relation database system working. We have published
what we believe to be the fundamental paper unifying ideas on what a universal relation (UR) system can
and should be. A paper surveying developments in the field of universal relation systems was invited for
the triennial IFIP Congress and was delivered in September. Some initial results on logical theories applied
to the problem of updating views have been obtained. There have been a number of developments
concerning inference of inclusion dependencies and the complexity of deciding certain properties
of database schemes. Some interesting results on the difficulty of obtaining hash functions that work
well for particular sets of data have been obtained and won an award.


I.  System/U

We now have a working version of “System/U,” our experimental universal-relation query answering system.
A translator of queries to parse trees, based on our view of universal relation semantics, has been working
since last summer, and during the past year we completed the optimization phase that translates these trees
into an ordered set of steps that implements the query efficiently. The final stage, where the optimized
sequence of steps is executed on files that store the actual database relations, was implemented on top of
I’K IS, which is Steve Reiss’s (Brown Univ.) relational database facility.

A description of the system, its data definition language, its query language, and the important
algorithmic ideas used to implement them so far has been compiled [KKU] and submitted for publication.

II.  Universal  Relation  Semantics

The paper [MUV1] was published in a recent conference proceedings, and an expanded version [MUV2]
has been accepted for TODS. These works unify the various assumptions that people have suggested were
necessary to make universal relation systems work. We identified a basic assumption, called the one-flavor
assumption, that we believe is essential for a database scheme to allow meaningful UR queries, and we believe
that this condition is also sufficient. Briefly, we require that two tuples in the relation produced by a UR
system be interchangeable, in the sense that if there are several paths that might give rise to tuples, the user
doesn’t care which path was actually taken to produce a tuple. Armed with this viewpoint, the database
designer can decide when attributes must be split, and when cycles can be permitted in the scheme. Of
course, the designer’s judgement is needed to decide when the one-flavor assumption is violated, but design
judgement is always needed, whatever the framework in which a database is designed.

Beyond the fundamental assumptions like the one-flavor assumption, there are two viewpoints people
have taken to define the “correct” response of a UR system.

1.  Give an algorithm for interpreting queries. System/U is an example of this approach.

2.  Define an abstract universal relation, and require that the response by the system be as if the query
were applied to this one relation. The work of Sagiv [S] is an example of this approach.

We showed in [MUV1, MUV2] the following equivalence between these two viewpoints. If there is any
first-order way (i.e., a formula in relational algebra) to produce the result of applying the query to the
representative instance (Sagiv’s [S] definition of the abstract UR), then we can do so by a finite union
of lossless “tableau mappings.” The latter are essentially expressions using lossless joins and projections.
Interestingly, the System/U approach, which produces unions of lossless joins, is not so far from the most
general possible approach, although we now see that there are situations where we shall miss generating
certain tuples that might logically deserve being generated.

The first papers on UR semantics written with support of the grant are beginning to appear in journals.
Recently, the papers [FMU] and [MU] were published. The ideas behind universal
relation systems, after having endured many years of often outright hostility, are finally being recognized as
significant. The paper [U] was invited to the 1983 IFIP Congress, and a rebuttal written by Ullman to a
logically invalid attack on the UR concept that appeared in TODS in 1981 is finally to appear in the next
issue of that journal.


III.  Logical  Databases  and  Updates  to  Views

The problem of implementing updates to views, of which the universal relation is a special case, has received
considerable attention recently. We believe that general schemes for accepting update requests about fictional
relations and translating them in an understandable and justifiable way to updates on the actual relations
can only be developed after one has an understanding of what the “meaning” of the update is. We have
therefore begun consideration of logical theories as sets of facts that are (explicitly or implicitly) found in
the database.

In [FUV] we set down a viewpoint in which databases are sets of facts, presumably including the facts
stored in the database, and possibly including some facts constructible from those facts and present in one
or more views. We also proposed a particular viewpoint regarding how an insertion or deletion affects the
set of facts in the database. First, when deleting a fact, the fact should no longer be implied by the facts
in the database, a point that seems incontestable. Next, when inserting a fact, the fact should then be in
(not just implied by) the database, and the database should not imply the negation of the fact, again a very
reasonable point to take. Third, we wish to assume that the database change is minimal, in the sense that
we do not delete a fact that could just as well be left in without contradicting anything, and we do not insert
spurious facts.

The major debatable assumption we make is that we do not delete any facts unless absolutely forced
to, and only as a second priority do we minimize the number of extra facts inserted. We are not wedded to
this point of view, but we like it on the grounds that the database represents facts that the users believe,
and we must be very careful about throwing them away. On the other hand, new facts, we show, need only
be inserted in response to an insertion request by the user.

Some interesting consequences follow from our assumptions. First, we discovered that it is essential to
make a distinction between the actual facts in the database, and the closure of the database, i.e., the facts
that follow logically from those in the database. This, in turn, means that two theories, i.e., database states,
can be logically equivalent, in the sense that each statement in one follows from statements in the other, and
yet have different properties under insertion and deletion.

Example 1: The theories T1 = {a, b} and T2 = {a, b, a ∨ b} are logically equivalent, since a ∨ b follows
logically from the statements a and b in T1, and all other members of either theory are present in the other.
However, if we delete a and then delete b from both theories, T1 becomes empty, while T2 retains a ∨ b, so
the two theories become logically inequivalent.
It may seem bizarre that a ∨ b can remain true after we deleted a and we deleted b. However, this
example points up several facts.

1.  The meaning of a deletion is only that we are no longer sure the fact is true. If we wanted to say that
a is absolutely not true, we would insert ¬a, which is different from deleting a.

2.  Certain derived facts, like a ∨ b, apparently play the role of an integrity constraint, i.e., such a fact
says that at all times, either a is true or b is true. We must not put them in the database explicitly unless we
mean them.

3.  As a generalization of (2), the details of what statements are in the database matter, even if the
statements in question are logical consequences of others in the database.
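The distinction drawn in Example 1 can be checked mechanically. The following is a minimal Python sketch, not code from [FUV]. It assumes every statement is a disjunction of atoms (represented as a frozenset), tests implication by brute-force truth tables, and models the two deletions as simple removal of the statements, which is enough here because neither a nor b is implied by what remains. The names holds, entails, and equivalent are illustrative only.

```python
from itertools import product

def holds(clause, world):
    """A clause is a frozenset of atoms read as a disjunction."""
    return any(world[atom] for atom in clause)

def entails(theory, clause, atoms):
    """Truth-table test: every world satisfying the theory satisfies the clause."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(holds(c, world) for c in theory) and not holds(clause, world):
            return False
    return True

def equivalent(t1, t2, atoms):
    """Each statement of one theory follows from the other theory."""
    return (all(entails(t1, c, atoms) for c in t2)
            and all(entails(t2, c, atoms) for c in t1))

atoms = ["a", "b"]
a, b, a_or_b = frozenset("a"), frozenset("b"), frozenset("ab")

T1 = {a, b}
T2 = {a, b, a_or_b}
equivalent_before = equivalent(T1, T2, atoms)

# Delete a, then b. In this example removing the statements suffices,
# because the remaining statements imply neither a nor b.
T1_after = T1 - {a, b}
T2_after = T2 - {a, b}
equivalent_after = equivalent(T1_after, T2_after, atoms)

print(equivalent_before, equivalent_after)
```

The two theories start out equivalent, but after the deletions T2 still contains a ∨ b while T1 is empty, so the stored statements, and not only their closure, determine the outcome.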

The second major point from [FUV] is that deletions, rather than insertions, are central, since it is shown
that insertion of a statement s is equivalent to deletion of ¬s followed by adjoining s to the database. In
general, there will be more than one theory that can be obtained by a minimal change, since several changes
may be incommensurate.

Example 2: Suppose our database includes Employee-Department facts and Department-Manager facts,
and there is an Employee-Manager view formed from these two relations by the obvious composition.
Then if Jones is in the Toy Dept., and that department is managed by Smith, there is the derived fact that
Smith manages Jones, which may or may not be explicitly in the theory, depending on our specific update
rules. If someone says “delete the fact that Smith manages Jones,” then we are left with two different
minimal changes:

1.  Delete  the  fact  that  Jones  is  in  the  Toy  Dept. 

2.  Delete  the  fact  that  Smith  manages  the  Toy  Dept. 

The viewpoint taken by [FUV] is that the actual database should be adjusted to be logically equivalent
to the “or” of these two possible worlds. That is, the new database consists of the single fact

“Jones is in the Toy Dept. or Smith manages the Toy Dept.”

assuming  there  were  no  other  facts  to  begin  with.  Notice  how  we  hold  onto  the  strongest  fact  that  does  not 
imply  the  deleted  fact,  yet  can  be  built  from  facts  formerly  in  the  database. 
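Example 2 can be reproduced by brute force. The sketch below is an illustration under stated assumptions, not the procedure of the cited paper: the atom names are invented, the view is modeled as a fixed background rule (both base facts together imply the derived fact), and the minimal changes are found by enumerating subsets of the stored facts.

```python
from itertools import combinations, product

# Hypothetical atom names for the three facts in the example.
ATOMS = ["jones_in_toy", "smith_manages_toy", "smith_manages_jones"]

def view_rule(w):
    """Composition view: both base facts together imply the derived fact."""
    return not (w["jones_in_toy"] and w["smith_manages_toy"]) or w["smith_manages_jones"]

def entails(facts, goal):
    """Truth-table test of whether the facts plus the view rule imply the goal."""
    for values in product([False, True], repeat=len(ATOMS)):
        w = dict(zip(ATOMS, values))
        if view_rule(w) and all(w[f] for f in facts) and not w[goal]:
            return False
    return True

def minimal_deletions(facts, goal):
    """Maximal subsets of the stored facts that no longer imply the goal."""
    facts = sorted(facts)
    safe = [set(s)
            for r in range(len(facts), -1, -1)
            for s in combinations(facts, r)
            if not entails(s, goal)]
    return [s for s in safe if not any(s < t for t in safe)]

database = {"jones_in_toy", "smith_manages_toy"}
worlds = minimal_deletions(database, "smith_manages_jones")

# The "or" of the possible worlds: one statement per choice of a fact from
# each world, each frozenset being read as a disjunction.
new_database = {frozenset(choice) for choice in product(*worlds)}

print(worlds)
print(new_database)
```

The enumeration finds exactly the two minimal changes listed above, and their "or" is the single disjunction of the two base facts. Subset enumeration is exponential, so this is only suitable for toy inputs.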

Example 3: As an example of the [FUV] rule for combining theories, let T1 = {a, b} and T2 = {a, c, d}.
Then T1 ∨ T2 = {a, a ∨ b, a ∨ c, a ∨ d, b ∨ c, b ∨ d}. Note that statements like a ∨ b are kept in, even though
implied by the statement a, which is really a ∨ a, the two a’s being taken from the two theories. This aspect
of the definition is essential for certain results to go through, it seems.
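The combination in Example 3 amounts to taking every pairwise disjunction of a statement from one theory with a statement from the other. A minimal Python sketch follows, assuming statements are disjunctions of atoms stored as frozensets. The function name disjoin is an illustrative choice.

```python
def disjoin(t1, t2):
    """Pairwise disjunction of two theories.

    Each statement is a frozenset of atoms read as an 'or', so the
    disjunction of two statements is the union of their atoms.
    """
    return {s1 | s2 for s1 in t1 for s2 in t2}

T1 = {frozenset("a"), frozenset("b")}
T2 = {frozenset("a"), frozenset("c"), frozenset("d")}

combined = disjoin(T1, T2)

# Print the statements in a stable order, shortest first.
for clause in sorted(combined, key=lambda c: (len(c), sorted(c))):
    print(" or ".join(sorted(clause)))
```

Nothing is simplified away in the result: a appears because a ∨ a collapses to a, and the weaker statements that a already implies are kept alongside it.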

IV.  Complexity  Issues  and  Dependency  Theory 

There has been significant progress in the development of algorithms for reasoning about functional
dependencies and inclusion dependencies. The latter are dependencies that say an entry in one or more columns
of one relation must also appear in designated columns of another (perhaps the same) relation. Typical
constraints of this form are “every Manager is an Employee”, or “if the department d is mentioned in the EMPS
relation, then there is also a tuple for d in the DEPTS relation.” Functional and inclusion dependencies are
by far the most common forms of dependencies found in practice, yet while the former are well understood,
the latter have been largely ignored by the theory, and their interaction has been unknown.
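To make the two kinds of dependency concrete, the sketch below checks whether given relation instances satisfy an inclusion dependency or a functional dependency. It only tests satisfaction on sample data and says nothing about inferring consequences of dependencies. The relation contents, column names, and function names are assumptions made for illustration.

```python
def satisfies_ind(r, r_cols, s, s_cols):
    """Inclusion dependency: every r[r_cols] value appears among s[s_cols]."""
    target = {tuple(t[c] for c in s_cols) for t in s}
    return all(tuple(t[c] for c in r_cols) in target for t in r)

def satisfies_fd(r, lhs, rhs):
    """Functional dependency: tuples agreeing on lhs also agree on rhs."""
    seen = {}
    for t in r:
        key = tuple(t[c] for c in lhs)
        value = tuple(t[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical instances of the EMPS and DEPTS relations.
EMPS = [{"name": "Jones", "dept": "Toy"}, {"name": "Smith", "dept": "Toy"}]
DEPTS = [{"dept": "Toy", "mgr": "Smith"}]

# "If department d is mentioned in EMPS, there is a tuple for d in DEPTS."
dept_ok = satisfies_ind(EMPS, ["dept"], DEPTS, ["dept"])
# "Every Manager is an Employee."
mgr_ok = satisfies_ind(DEPTS, ["mgr"], EMPS, ["name"])
# Each department has one manager.
fd_ok = satisfies_fd(DEPTS, ["dept"], ["mgr"])

# Adding an employee in an unlisted department violates the first dependency.
bad_emps = EMPS + [{"name": "Lee", "dept": "Shoe"}]
dept_violated = not satisfies_ind(bad_emps, ["dept"], DEPTS, ["dept"])

print(dept_ok, mgr_ok, fd_ok, dept_violated)
```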

In [KCV], the interaction between functional dependencies (FDs) and unary inclusion dependencies (those
involving the containment of a single attribute in another, by far the most common case) was uncovered,
and an efficient algorithm for finding all consequences was given. However, for FDs and binary inclusion
dependencies, there is no algorithm to find the consequences [CV].

Further explorations have been made into a number of other areas of dependency theory. [GMV]
discusses what it means for databases (sets of relations) to satisfy dependencies. This issue is of importance
for UR semantics, because we presume that any (imaginary) universal relation will satisfy given dependencies
in some sense. [CV] shows how to test consistency, one of the notions from [GMV], in polynomial time,
provided the dependencies are “total,” i.e., they apply to the UR as a whole.

V.  Properties  of  Acyclic  Database  Schemes 

We  have  long  felt  that  the  “acyclic”  database  schemes  played  a  fundamental  role  in  design  of  databases.  In 
past  years  we  reported  a  large  number  of  useful  properties  possessed  by  these  schemes  but  not  by  cyclic  ones. 
The  basic  query  answering  strategy  of  System/U  depends  on  decomposing  the  database  scheme  into  acyclic 
subschemes. 

The  paper  [BV]  shows  another  useful  property  of  acyclic  schemes.  Specifically,  for  these  schemes  (but 
not  in  general),  the  natural  join  is  the  way  to  reconstruct  the  universal  relation  whenever  a  unique  universal 
relation  exists.  This  is  further  confirmation  that  we  have  adopted  the  correct  approach  to  answering  queries, 
since  we  are  assured  that  our  system  will  construct  the  universal  relation  correctly  and  answer  the  query  as 
if  it  were  asked  about  that  universal  relation. 
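As an illustration of reconstructing a universal relation by natural join, here is a small Python sketch over a hypothetical acyclic scheme with relations on {emp, dept} and {dept, mgr}. It is not the System/U implementation, and it does not test acyclicity or uniqueness. It joins the relations and confirms that projecting the result gives back the stored relations.

```python
from functools import reduce

def natural_join(r, s):
    """Join tuples (dicts) that agree on all attributes they share."""
    joined = []
    for t in r:
        for u in s:
            if all(t[c] == u[c] for c in t.keys() & u.keys()):
                joined.append({**t, **u})
    return joined

def project(rel, cols):
    """Projection onto cols, returned as a set of value tuples."""
    return {tuple(t[c] for c in cols) for t in rel}

# Hypothetical stored relations over an acyclic scheme.
ED = [{"emp": "Jones", "dept": "Toy"}, {"emp": "Lee", "dept": "Shoe"}]
DM = [{"dept": "Toy", "mgr": "Smith"}, {"dept": "Shoe", "mgr": "Kim"}]

universal = reduce(natural_join, [ED, DM])

# The join projects back onto exactly the stored relations.
ed_recovered = project(universal, ["emp", "dept"]) == project(ED, ["emp", "dept"])
dm_recovered = project(universal, ["dept", "mgr"]) == project(DM, ["dept", "mgr"])

print(universal)
print(ed_recovered, dm_recovered)
```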

VI.  Efficient  Retrieval 

In [M], which won the Machtey Prize for the best student paper at the upcoming IEEE Symp. on Foundations
of Computer Science, the issue discussed is the tradeoff between the performance of hashing functions (how
many collisions they induce on particular sets of data) and the program complexity of the functions, i.e., how
long the program computing the hash function must be. A variety of hashing-like schemes are discussed,
including hashing with secondary chains and hashing into blocks.

In addition to getting precise bounds on how long the typical program must be for each of these schemes,
the paper and doctoral thesis conclude that there is no advantage to a scheme in which pointers are used to
form chains, when compared with schemes that calculate addresses directly. The best sort of scheme appears
to be one where the hash function gives the address of a block of memory where the datum in question may
be stored, and binary search within the block is used to find the datum if it exists. This observation about
pointers appears to be borne out in practice, even when the data set is changing rather than fixed.
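The block-addressing scheme described above can be sketched as follows. This is an illustrative Python example, not the construction analyzed in [M]: the class name BlockTable is invented, keys are assumed to be integers, and the hash function (key modulo the number of blocks) is a placeholder chosen for simplicity rather than for low program complexity or few collisions.

```python
from bisect import bisect_left

class BlockTable:
    """Static table: the hash gives a block address, then binary search inside."""

    def __init__(self, keys, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = [[] for _ in range(num_blocks)]
        for key in keys:
            self.blocks[self._address(key)].append(key)
        for block in self.blocks:
            block.sort()  # sorted blocks allow binary search

    def _address(self, key):
        # Placeholder hash function (an assumption for this sketch).
        return key % self.num_blocks

    def __contains__(self, key):
        block = self.blocks[self._address(key)]
        i = bisect_left(block, key)
        return i < len(block) and block[i] == key

table = BlockTable([3, 17, 42, 8, 25, 11], num_blocks=4)
print(table.blocks)
print(42 in table, 5 in table)
```

No pointers or chains are followed during a lookup: the address of the block is computed directly, and the search is confined to that block.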